
Non-record: Three Approaches + Lessons Learned (best: 1.1188 BPB)#1001

Open
ibarrajo wants to merge 1 commit into openai:main from ibarrajo:submission/non-record-approaches


@ibarrajo

Summary

Three approaches tested, all rule-compliant. Best legal result: 1.1188 BPB (s_0 TTT only).

Previous PR #991 was closed because TTT re-scored tokens after training. This submission reports only the legal s_0 score. All GPTQ calibration runs fit within the 600s training budget.

| Approach | val_bpb | Notes |
| --- | --- | --- |
| A (#569 VRL+GPTQ, int5, no TTT) | 1.1317 | int5 penalty on d=512 |
| B (#576 d=576 int5, no TTT) | 1.1249 | Strong base |
| B + legal s_0 TTT | 1.1188 | Score-first only, no re-eval |
| C (GEPA int5 + TTT) | N/A | Artifact 16.3MB, over limit |

Lessons learned

  1. TTT re-scoring is illegal: only the cumulative s_0 score from the first pass counts.
  2. int5 quantization on d=512 costs +0.014 BPB vs int6.
  3. Legal s_0 TTT gives a -0.006 BPB improvement.
  4. GPTQ calibration must fit within the 600s training budget; our script reserves time for it and asserts.
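Lesson 4 can be sketched as a simple wall-clock guard. This is a minimal illustration, not the submission's actual script: the function names and the 60s reserve for GPTQ are hypothetical, and only the shape of the assertion (train + GPTQ measured against the single 600s budget) comes from the PR text.

```python
import time

TRAIN_BUDGET_S = 600.0  # total training budget from the rules
GPTQ_RESERVE_S = 60.0   # hypothetical slice reserved for GPTQ calibration

def run_training_and_gptq(train_fn, gptq_fn):
    """Run training, then GPTQ calibration, asserting both fit one budget."""
    start = time.monotonic()
    # train_fn is expected to stop on its own before the reserved deadline
    train_fn(deadline=start + TRAIN_BUDGET_S - GPTQ_RESERVE_S)
    gptq_fn()
    elapsed = time.monotonic() - start
    assert elapsed < TRAIN_BUDGET_S, f"train+gptq took {elapsed:.1f}s, over budget"
    return elapsed
```

The key design point is that calibration shares the training clock rather than getting its own, so an overlong training run fails the assert instead of silently eating the GPTQ reserve.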

Rule compliance

  • GPTQ calibration within training budget (assert: train+gptq < 600s)
  • Artifact < 16MB (assert in code)
  • Eval < 600s (assert in code)
  • TTT reports s_0 only — NO re-scoring after training
  • No val tokens in artifact
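The s_0-only rule above amounts to a score-first loop: each token is scored under the current parameters before any test-time update, and earlier scores are never recomputed. A minimal sketch follows; the `model.nll` / `model.ttt_update` interface is hypothetical, and the nats-to-bits conversion assumes one token per byte for simplicity.

```python
import math

def legal_s0_bpb(model, tokens):
    """Score-first TTT: score each token BEFORE updating, never re-score."""
    total_nll = 0.0
    for i, tok in enumerate(tokens):
        # 1) score the next token under the CURRENT parameters; this is
        #    the only number that ever counts toward cumulative s_0
        total_nll += model.nll(context=tokens[:i], target=tok)
        # 2) only then apply a test-time update, which may help FUTURE tokens
        model.ttt_update(tokens[:i + 1])
    # convert summed nats to bits per byte (assumes 1 token == 1 byte here)
    return total_nll / math.log(2) / len(tokens)
```

Re-scoring tokens after training, as in the closed PR #991, would mean calling `model.nll` again on earlier positions after later `ttt_update` calls, which this loop structurally cannot do.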

Based on PRs #569, #576, #505. Submitted as non-record data points.

🤖 Generated with Claude Code

Approach A (openai#569 int5 no TTT): 1.1317 — int5 penalty too high on d=512
Approach B (openai#576 d=576 int5 + legal s_0 TTT): 1.1188 — best legal result
Approach C (GEPA int5 + TTT): artifact over 16MB

Key lesson: TTT re-scoring is illegal (PR openai#991 closed for this).
Only s_0 cumulative first-pass score is legal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
